Abstract

Inference of causality is central in nonlinear time series analysis and science
in general. A popular approach to infer causality between two processes is
to measure the information flow between them in terms of transfer entropy.
Using dynamics of coupled oscillator networks, we show that although transfer
entropy can successfully detect information flow between two processes, it
often results in erroneous identification of network connections in the
presence of indirect interactions, dominance of neighbors, or anticipatory
couplings. Such effects are found to be profound for time-dependent networks.
To overcome these limitations, we develop a measure called causation entropy,
and show that its application can lead to reliable identification of true
couplings.

Keywords: causality inference, causation entropy, coupled oscillator 
networks, blinking couplings 


1. Introduction 

The long-standing puzzle of “what causes what”, formally known as the
problem of causality inference, is challenging yet central in science [1, 2].
Understanding causal relationships between events has important implications
in a wide range of areas including, for example, social perception [3],
epidemiology, and econometrics [5]. It is the reliable inference of causality
that allows one to untangle complex causal interactions, make predictions, and
ultimately design intervention strategies.

The traditional approach to inferring causality between two stochastic
processes is to perform the Granger causality test [6]. A main limitation of
this test is that it can only provide information about linear dependence
between two processes, and therefore fails to capture intrinsic nonlinearities
that are common in real-world systems. To overcome this difficulty, Schreiber
developed the concept of transfer entropy between two processes [7]. Transfer
entropy measures the uncertainty reduction in inferring the future state of a
process by learning the (current and past) states of another process. Being
an asymmetric measure by design, transfer entropy is often used to infer the
directionality of information flow and, further, the causality between two
processes. Recently, it has become increasingly popular to use transfer entropy
for causality inference in networks of neurons and in coupled dynamical
systems with parameter mismatches [12], anticipatory couplings [13], and
time delays [14]. However, despite the large number of proposed
applications, a clear interpretation of the relationships inferred by
transfer entropy is lacking.

In this paper, we study information transfer in the dynamics of small-scale
coupled oscillator networks. We show by several examples that causal
relationships inferred by transfer entropy are often misleading when the
underlying system contains indirect connections, dominance of neighboring
dynamics, or anticipatory couplings. To account for these effects, we develop
a measure called causation entropy (CSE), and show that its appropriate
application reveals true coupling structures of the underlying dynamics.

2. Information Theory and Dynamical Systems 

In this section we introduce the mathematical tools used in this study, 
which include elements from both dynamical systems and information theory. 

2.1. Dynamical System as a Stochastic Process 

Our focus in this paper is on discrete dynamical systems of the form

$$x_{t+1} = f(x_t), \tag{1}$$

where $x_t \in \mathcal{D} \subset \mathbb{R}^m$ is the state variable and
$f : \mathcal{D} \to \mathcal{D}$ is the dynamic rule of the system. A
trajectory (or orbit) $\{x_t\}$ of Eq. (1) naturally represents a time series.
For a continuous dynamical system $\dot{x} = f(x)$, a time series can be
obtained by sampling its continuous trajectory at discrete time points. The
time points are often chosen to be uniformly spaced in time or to be the time
instances at which the trajectory intersects a given manifold that is
transversal to the trajectory, called a Poincaré section [15].

A natural bridge between dynamical systems and information theory is
the formulation of symbolic dynamics, which requires discretization of the
phase space. In particular, a finite topological partition
$\mathcal{P} = \{P_1, \ldots, P_m\}$ of the phase space $\mathcal{D}$ is a
collection of pairwise disjoint sets in $\mathcal{D}$ whose union is
$\mathcal{D}$ [16]. Defining the associated set of symbols
$\Omega = \{1, 2, \ldots, m\}$, one can transform a trajectory $\{x_t\}$ into a
symbolic sequence $\{s_t\}$, where $s_t$ is defined by

$$x_t \in P_i \subset \mathcal{D} \;\Rightarrow\; s_t = i \in \Omega. \tag{2}$$
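
As an illustration of the symbolization step, the following minimal sketch maps a scalar trajectory to symbols using equal-width cells. The function name `symbolize`, the interval bounds `lo` and `hi`, and the choice of half-open cells (a boundary point goes to the cell on its right) are assumptions made here for illustration; the text only requires the cells to be pairwise disjoint and to cover the phase space.

```python
import numpy as np

def symbolize(trajectory, m, lo=0.0, hi=1.0):
    """Map each state in [lo, hi] to a symbol in {1, ..., m} using m equal-width cells."""
    x = np.asarray(trajectory, dtype=float)
    # Interior cell boundaries; a point lying on a boundary is assigned to the cell on its right.
    edges = np.linspace(lo, hi, m + 1)[1:-1]
    return np.digitize(x, edges) + 1

traj = [0.0, 0.1, 0.26, 0.5, 0.99, 1.0]
print(symbolize(traj, m=4))  # [1 1 2 3 4 4]
```

Because the cells are half-open, every state receives exactly one symbol, which matches the requirement that the partition elements be pairwise disjoint.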

Viewing $\Omega$ as the sample space, the symbolic sequence $\{s_t\}$ can be
seen as a time series of a stochastic process. Define a probability measure
over the partition $\mathcal{P}$, as

$$\mu : \mathcal{P} \to \mathbb{R}. \tag{3}$$

If $\mu$ is invariant under the dynamics, then

$$\mathrm{Prob}(s_t = i) = \mu(P_i) \quad \text{for all } i \in \Omega \text{ and all times } t. \tag{4}$$

A partition $\mathcal{P}$ is called a Markov partition if it gives rise to a
stochastic process that is Markovian, i.e., future states of the process
depend only on its current state, and not on the past states.

2.2. Information-Theoretical Measures: Entropy, Mutual Information and 
Transfer Entropy 

Consider a discrete random variable $X$ whose probability mass function
is denoted by $p(x) = \mathrm{Prob}(X = x)$. To quantify the unpredictability
of $X$, one can calculate its (information) entropy, defined as

$$H(X) = -\sum_{x} p(x) \log p(x), \tag{5}$$

where by convention, we use “log” to represent “$\log_2$”. In general, $H(X)$
approximates the minimal binary description length $L$ of the random variable
$X$, with the following inequality:

$$H(X) \le L < H(X) + 1. \tag{6}$$

It follows from Eq. (5) that, among all random variables with $c$ elements,
the one with uniform distribution yields the maximum entropy, $\log(c)$.
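
The definition in Eq. (5) and the maximum-entropy property of the uniform distribution can be checked numerically. The sketch below is only illustrative; the function name `entropy` and the two example distributions are assumptions introduced here, not taken from the text.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a probability mass function given as a list of probabilities."""
    # Terms with p = 0 contribute nothing, by the usual convention 0 log 0 = 0.
    return -sum(p * math.log2(p) for p in pmf if p > 0)

c = 8
uniform = [1.0 / c] * c          # uniform distribution on c elements
skewed = [0.65] + [0.05] * 7     # a non-uniform distribution on the same c elements

print(entropy(uniform))  # 3.0
print(entropy(skewed) < entropy(uniform))  # True
```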

Consider now two random variables $X$ and $Y$ with joint distribution

$$p(x, y) = \mathrm{Prob}(X = x, Y = y), \tag{7}$$

and conditional distribution

$$p(x|y) = \mathrm{Prob}(X = x | Y = y). \tag{8}$$

The joint entropy $H(X, Y)$ and conditional entropy $H(X|Y)$ for $X$ and $Y$
are defined, respectively, as

$$H(X, Y) = -\sum_{x, y} p(x, y) \log p(x, y), \tag{9}$$

and

$$H(X|Y) = \sum_{y} p(y) H(X|Y = y) = -\sum_{x, y} p(x, y) \log p(x|y). \tag{10}$$

A similar definition holds for $H(Y|X)$.

It is easy to verify that conditioning reduces entropy, i.e., knowledge of
$Y$ will reduce (or at least cannot increase) the uncertainty about $X$:

$$H(X|Y) \le H(X). \tag{11}$$

Similarly, $H(Y|X) \le H(Y)$.

The reduction of uncertainty of $X$ ($Y$) given full information about $Y$
($X$) can be measured by the mutual information between $X$ and $Y$, as [21]

$$I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X). \tag{12}$$

The mutual information is symmetric in $X$ and $Y$, and measures their
deviation from independence: if $X$ and $Y$ are fully dependent, then
$H(X|Y) = H(Y|X) = 0$ and thus $I(X; Y) = H(X) = H(Y)$; on the other hand, if
$X$ and $Y$ are independent, then $H(X|Y) = H(X)$ and $H(Y|X) = H(Y)$ and
therefore $I(X; Y) = 0$. In general, we have [21]

$$0 \le I(X; Y) \le \min[H(X), H(Y)]. \tag{13}$$

It is convenient to visualize the relationship between entropy, joint
entropy, conditional entropy, and mutual information by a Venn-like diagram,
as shown in Fig. 1(a).
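
The two limiting cases described above (full dependence and independence) can be reproduced from a joint probability table. This is a minimal sketch; the helper names `H`, `conditional_entropy`, and `mutual_information` are assumptions of this example, and the conditional entropy is computed through the chain rule $H(X|Y) = H(X, Y) - H(Y)$, which is equivalent to Eq. (10).

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(joint):
    """H(X|Y) = H(X,Y) - H(Y), with rows indexing x and columns indexing y."""
    joint = np.asarray(joint, dtype=float)
    return H(joint) - H(joint.sum(axis=0))

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), as in Eq. (12)."""
    joint = np.asarray(joint, dtype=float)
    return H(joint.sum(axis=1)) - conditional_entropy(joint)

independent = np.outer([0.5, 0.5], [0.5, 0.5])   # p(x, y) = p(x) p(y)
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])   # X = Y with probability one

print(mutual_information(independent))
print(mutual_information(dependent))
```

For the independent table the mutual information vanishes, while for the fully dependent table it equals $H(X) = H(Y) = 1$ bit, the upper bound in Eq. (13).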

We now turn to stochastic processes. For a stationary process $\{X_t\}$, its
entropy rate $H(X)$ can be defined as

$$H(X) = \lim_{t \to \infty} H(X_t | X_{t-1}, X_{t-2}, \ldots, X_1), \tag{14}$$

which can be thought of as the (asymptotic) growth rate of the joint entropy
$H(X_1, X_2, \ldots, X_t)$. If the process is Markovian, then [21]

$$H(X) = \lim_{t \to \infty} H(X_t | X_{t-1}). \tag{15}$$

For two stochastic processes $\{X_t\}$ and $\{Y_t\}$, the reduction of
uncertainty about $X_{t+1}$ due to the information of the past $\tau_Y$ states
of $Y$, represented by

$$Y_t^{(\tau_Y)} = (Y_t, Y_{t-1}, \ldots, Y_{t-\tau_Y+1}), \tag{16}$$

in addition to the information of the past $\tau_X$ states of $X$, represented
by

$$X_t^{(\tau_X)} = (X_t, X_{t-1}, \ldots, X_{t-\tau_X+1}), \tag{17}$$

is measured by the transfer entropy from $Y$ to $X$, defined as [7]

$$T_{Y \to X} = H(X_{t+1} | X_t^{(\tau_X)}) - H(X_{t+1} | X_t^{(\tau_X)}, Y_t^{(\tau_Y)}). \tag{18}$$

One can similarly define $T_{X \to Y}$, which is not necessarily equal to
$T_{Y \to X}$. Note that $T_{Y \to X}$ can also be interpreted as the mutual
information between $X_{t+1}$ and $Y_t^{(\tau_Y)}$ conditioned on
$X_t^{(\tau_X)}$. In this paper, we focus on the case where

$$\tau_X = \tau_Y = 1, \tag{19}$$

unless specified otherwise. The relationships between transfer entropy,
entropy, and conditional entropy are illustrated in Fig. 1(b).
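
For the case $\tau_X = \tau_Y = 1$, the transfer entropy can be estimated from two symbol sequences by replacing probabilities with relative frequencies. The sketch below is one simple plug-in estimator written under that assumption; the function names `joint_entropy` and `transfer_entropy` and the synthetic test signal (a random bit stream `y` copied into `x` with one step of delay) are hypothetical and are not part of the study.

```python
import numpy as np

def joint_entropy(*columns):
    """Plug-in joint entropy (bits) of one or more equally long symbol sequences."""
    data = np.column_stack(columns)
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def transfer_entropy(source, target):
    """Estimate T_{source -> target} with history length 1 for both processes."""
    s = np.asarray(source)
    x = np.asarray(target)
    x_next, x_now, s_now = x[1:], x[:-1], s[:-1]
    # H(X_{t+1} | X_t) = H(X_{t+1}, X_t) - H(X_t)
    h_given_own = joint_entropy(x_next, x_now) - joint_entropy(x_now)
    # H(X_{t+1} | X_t, S_t) = H(X_{t+1}, X_t, S_t) - H(X_t, S_t)
    h_given_both = joint_entropy(x_next, x_now, s_now) - joint_entropy(x_now, s_now)
    return h_given_own - h_given_both

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=5000)   # independent fair bits
x = np.roll(y, 1)                   # x[t+1] = y[t]: X is driven entirely by Y

te_yx = transfer_entropy(y, x)      # expected to be close to 1 bit
te_xy = transfer_entropy(x, y)      # expected to be close to 0
print(te_yx, te_xy)
```

The asymmetry of the two estimates reflects the asymmetry of the definition in Eq. (18). The small positive value in the uncoupled direction is a finite-sample bias of the plug-in estimator.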

3. Measuring Information Transfer in Two Coupled Oscillators 

Coupled oscillator networks are commonly used for modeling the dynamic
behavior of complex systems in various areas [23, 24, 25, 26]. Here we consider
discrete dynamics of coupled oscillator networks, in the form

$$x_{t+1}^{(i)} = f[x_t^{(i)}] + \epsilon \sum_{j \neq i} c_{ij}\, g[x_t^{(i)}, x_t^{(j)}], \quad i = 1, 2, \ldots, N. \tag{20}$$

Here $x_t^{(i)} \in \mathcal{D} \subset \mathbb{R}$ is the state of oscillator
$i$ at time $t$, $f : \mathcal{D} \to \mathcal{D}$ is the dynamics of
individual oscillators, $g : \mathcal{D} \times \mathcal{D} \to \mathcal{D}$
is the coupling function, and $\epsilon$ is the coupling strength. The term
$c_{ij}$ represents the coupling from $j$ to $i$. In this paper, we use

$$f(x) = a x (1 - x) \tag{21}$$

with parameter $a = 4$. The coupling function is chosen to be

$$g(x, y) = f(y) - f(x). \tag{22}$$

The choice of $\epsilon \in [0, 1]$ and the normalization condition

$$\sum_{j} c_{ij} = 1 \tag{23}$$

guarantee that

$$x_t^{(i)} \in \mathcal{D} = [0, 1] \quad \text{for all } i \text{ and } t. \tag{24}$$
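
A direct way to generate trajectories of Eq. (20) with the logistic map and the coupling function $g(x, y) = f(y) - f(x)$ is sketched below. The function names `logistic` and `simulate`, the initial condition, and the final clipping step (a guard against floating-point round-off only) are assumptions of this example rather than details given in the text.

```python
import numpy as np

def logistic(x, a=4.0):
    """Individual oscillator dynamics f(x) = a x (1 - x)."""
    return a * x * (1.0 - x)

def simulate(C, eps, steps, x0):
    """Iterate the coupled map network; C[i, j] is the coupling from node j to node i."""
    C = np.asarray(C, dtype=float)
    x = np.asarray(x0, dtype=float)
    row_sum = C.sum(axis=1)
    out = np.empty((steps, x.size))
    for t in range(steps):
        out[t] = x
        fx = logistic(x)
        # f(x_i) + eps * sum_j c_ij [f(x_j) - f(x_i)]
        x = fx + eps * (C @ fx - row_sum * fx)
        x = np.clip(x, 0.0, 1.0)  # round-off guard; the exact dynamics stay in [0, 1]
    return out

C_bidirectional = np.array([[0.0, 1.0], [1.0, 0.0]])
traj = simulate(C_bidirectional, eps=0.5, steps=200, x0=[0.2, 0.7])
print(traj.min(), traj.max())
```

With $\epsilon = 0.5$ and bidirectional coupling, both update rules reduce to the average of $f(x^{(1)})$ and $f(x^{(2)})$, so the two oscillators coincide from the first iterate onward, which makes this parameter choice a convenient sanity check.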


We first explore information transfer in two coupled oscillators, with
bidirectional and unidirectional couplings, respectively. With a slight abuse
of notation, we use $X$ and $Y$ to represent oscillators 1 and 2. In terms of
Eq. (20), the bidirectional coupling corresponds to $c_{12} = c_{21} = 1$,
and the unidirectional coupling from $X$ to $Y$ corresponds to $c_{21} = 1$,
$c_{12} = 0$ (recall that $c_{ij}$ is the coupling from $j$ to $i$). Results
from numerical simulation are shown in Fig. 2.

One direct observation is that mutual information can be used as a measure
of synchrony between two oscillators $X$ and $Y$. When $X$ and $Y$ are
synchronized, their mutual information

$$I(X; Y) = H(X) = H(Y). \tag{25}$$

When they are not synchronized,

$$I(X; Y) < \min[H(X), H(Y)]. \tag{26}$$

We remark that this observation suggests a new and alternative way of
measuring generalized synchronization or synchronization among a partial set
of nodes in a large-scale network [29, 30].

For bidirectionally coupled oscillators, synchrony occurs when the coupling
strength [31]

$$\epsilon \in (0.25, 0.75), \tag{27}$$

as shown in Fig. 2(a). The mutual information reaches its maximum for the
same range of $\epsilon$. Similarly, synchronization and maximum mutual
information both occur when

$$\epsilon \in (0.5, 1], \tag{28}$$

in the case where $X$ and $Y$ are unidirectionally coupled [Fig. 2(c)].
Figures 2(b,d) show typical time series of the bidirectionally and
unidirectionally coupled oscillators with $\epsilon = 0.1$ (unsynchronized
trajectories) and $\epsilon = 0.6$ (synchronized trajectories), respectively.

When two oscillators synchronize, the transfer entropy from either one of 
them to the other becomes zero because no extra information can be gained 
by learning the past trajectory of the other oscillator (in addition to that from 
one’s own). As a consequence, detection of coupling by transfer entropy (or 
any other measure) is valid only when the oscillators are not synchronized. 
Oscillators that are synchronized produce identical trajectories and therefore 
appear indistinguishable. 

When the two oscillators are not synchronized, there is a positive transfer
entropy following the directionality of coupling. For bidirectionally coupled
oscillators,

$$T_{X \to Y} = T_{Y \to X} > 0 \quad \text{if } \epsilon \in (0, 0.25) \cup (0.75, 1], \tag{29}$$

except for a few parameter values at which the trajectories of $X$ and $Y$
settle into a periodic orbit [Fig. 2(a)]. For unidirectionally coupled
oscillators, positive transfer entropy $T_{X \to Y}$ is observed, with

$$T_{X \to Y} > T_{Y \to X} = 0 \quad \text{if } \epsilon \in (0, 0.5). \tag{30}$$

This absolute asymmetry of transfer entropy confirms the dominant direction
of information flow from $X$ to $Y$, and not the other way around [Fig. 2(c)].

4. Measuring Information Transfer in Coupled Oscillator Networks 

Having studied the application of transfer entropy in systems of two coupled
oscillators, we now turn to networks.

4.1. Effect of Indirect Influence

First we explore information transfer in the presence of indirect couplings.
Consider a directed linear chain

$$Z \to Y \to X, \tag{31}$$

where $Z$ indirectly influences $X$ through $Y$ [Fig. 3(a)]. We focus on the
dynamics of this three-node network according to Eq. (20), with
$\epsilon \in [0.2, 0.4]$, a regime where coupling has a non-negligible effect
on the dynamics but is not strong enough to result in synchronization.

In Fig. 4(a) we plot values of the transfer entropies $T_{X \to X}$,
$T_{Y \to X}$, and $T_{Z \to X}$ ($T_{X \to Y} \approx T_{X \to Z} \approx 0$
are not plotted). By definition, $T_{X \to X} = 0$. The direct influence of
$Y$ on $X$ is validated by the positive values of $T_{Y \to X}$.
Interestingly, values of $T_{Z \to X}$ are also positive, despite the fact
that there is no direct coupling from $Z$ to $X$. Similar results are found
for other networks that contain the directed linear chain $Z \to Y \to X$ but
without the direct coupling $Z \to X$. See Fig. 3(b-c) for the other two
networks and Fig. 4(c,e) for the corresponding results.

One important implication of these results is that the use of transfer
entropy for inferring network structure can be inappropriate in the presence
of indirect influences. Since indirect couplings are common in many
networks, directed edges that are inferred by measuring transfer entropy can
often be “false positives”.

4.2. Causation Entropy

We note that the key reason transfer entropy often fails to distinguish
indirect couplings from direct ones is that it is a pairwise measure between
two processes. For example, the transfer entropy $T_{Z \to X}$ shown in
Fig. 4(a,c,e) does not account for the fact that the observed information
transfer from $Z$ to $X$ is in fact a consequence of the direct information
transfer from $Z$ to $Y$, and then from $Y$ to $X$.

Here we propose a new measure, which we call causation entropy. The
causation entropy from $Z$ to $X$ (conditioned on $X$ and $Y$) is defined as

$$C_{Z \to X|(X,Y)} = H(X_{t+1}|X_t, Y_t) - H(X_{t+1}|X_t, Y_t, Z_t). \tag{32}$$

Thus, $C_{Z \to X|(X,Y)}$ measures the extra information provided to $X$ by
$Z$ in addition to the information that is already provided to $X$ by other
means.

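For an arbitrary set of processes, causation entropy is defined as follows.

Definition 1 (Causation Entropy). The causation entropy from process
$\mathcal{Q}$ to process $\mathcal{P}$ conditioned on the set of processes
$\mathcal{S}$ is defined as

$$C_{\mathcal{Q} \to \mathcal{P}|(\mathcal{S})} = H(\mathcal{P}_{t+1}|\mathcal{S}_t) - H(\mathcal{P}_{t+1}|\mathcal{S}_t, \mathcal{Q}_t). \tag{33}$$

Causation entropy $C_{\mathcal{Q} \to \mathcal{P}|(\mathcal{S})}$ is a
generalization of transfer entropy. In fact, by letting
$\mathcal{S} = \mathcal{P}$, we have

$$C_{\mathcal{Q} \to \mathcal{P}|(\mathcal{P})} = T_{\mathcal{Q} \to \mathcal{P}}. \tag{34}$$

In general, causation entropy
$C_{\mathcal{Q} \to \mathcal{P}|(\mathcal{S})}$ measures the reduction in
uncertainty in $\mathcal{P}$ due to the extra knowledge of $\mathcal{Q}$ in
addition to that of $\mathcal{S}$.

If $\mathcal{S} = \emptyset$, we simply write

$$C_{\mathcal{Q} \to \mathcal{P}} = C_{\mathcal{Q} \to \mathcal{P}|(\emptyset)}. \tag{35}$$

It follows that

$$C_{\mathcal{Q} \to \mathcal{P}} = H(\mathcal{P}_{t+1}) - H(\mathcal{P}_{t+1}|\mathcal{Q}_t) = I(\mathcal{P}_{t+1}; \mathcal{Q}_t), \tag{36}$$

which is the mutual information between $\mathcal{P}_{t+1}$ and
$\mathcal{Q}_t$. When $\mathcal{S} \neq \emptyset$, causation entropy
$C_{\mathcal{Q} \to \mathcal{P}|(\mathcal{S})}$ can be interpreted as the
mutual information shared between $\mathcal{P}_{t+1}$ and $\mathcal{Q}_t$
conditioned on $\mathcal{S}_t$.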
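
Definition 1 translates directly into a plug-in estimator for symbol sequences. The sketch below is illustrative only: the function names, the use of relative frequencies as probabilities, and the synthetic chain (a "sticky" binary process `z` that is copied into `y` and then into `x` with one step of delay each) are assumptions of this example and are not the coupled-map simulations of the study.

```python
import numpy as np

def joint_entropy(columns):
    """Plug-in joint entropy (bits) of a list of symbol sequences; 0 for an empty list."""
    if len(columns) == 0:
        return 0.0
    data = np.column_stack(columns)
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def causation_entropy(q, p, s=()):
    """Estimate C_{Q -> P | (S)} = H(P_{t+1} | S_t) - H(P_{t+1} | S_t, Q_t)."""
    future = np.asarray(p)[1:]
    q_now = np.asarray(q)[:-1]
    s_now = [np.asarray(c)[:-1] for c in s]
    h_given_s = joint_entropy([future] + s_now) - joint_entropy(s_now)
    h_given_sq = joint_entropy([future] + s_now + [q_now]) - joint_entropy(s_now + [q_now])
    return h_given_s - h_given_sq

rng = np.random.default_rng(0)
n = 20000
z = np.cumsum(rng.random(n) < 0.1) % 2   # binary state that flips with probability 0.1
y = np.roll(z, 1)                        # y[t+1] = z[t]
x = np.roll(y, 1)                        # x[t+1] = y[t], so Z reaches X only through Y

te_zx = causation_entropy(z, x, s=(x,))      # transfer entropy T_{Z->X}, cf. Eq. (34)
ce_zx = causation_entropy(z, x, s=(x, y))    # causation entropy C_{Z->X|(X,Y)}
print(te_zx, ce_zx)
```

In this toy chain the transfer entropy from `z` to `x` is clearly positive because `z` is correlated in time, whereas the causation entropy conditioned on both `x` and `y` vanishes up to round-off, since `y` already determines the next value of `x`.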

Figure 4(b,d,f) shows that, for the networks in Fig. 3(a-c), both
$C_{X \to X}$ and $C_{Y \to X|(X)}$ are positive, as a result of the influence
of $X$ on itself (self-dynamics) and the direct influence of $Y$ on $X$. On
the other hand, and by design, the causation entropy
$C_{Z \to X|(X,Y)} \approx 0$, in sharp contrast to the positive transfer
entropy, $T_{Z \to X} > 0$ [Fig. 4(a,c,e)]. The reason $C_{Z \to X|(X,Y)}$ is
close to zero is that the information provided by $Z$ (to $X$) is merely a
subset of the information provided by $Y$. No extra information about $X$'s
future state can be gained by learning the current state of $Z$ if those of
$X$ and $Y$ are already known.


4.3. Example: Dominance of neighbors

Dominance of neighbors refers to a scenario where an oscillator's future
state is dominantly determined by the state of its neighboring nodes, rather
than by itself. In terms of Eq. (20), this occurs when the coupling strength
$\epsilon \approx 1$. We here explore its effect on information transfer. As
an example, we consider dynamics by Eq. (20) on the network shown in
Fig. 3(d), where node $X$ receives input from $Y$, but not from $Z$ (even
indirectly).

As shown in Fig. 5(a), transfer entropy $T_{Y \to X}$ is positive, due to the
direct influence of $Y$ on $X$. Surprisingly, transfer entropy $T_{Z \to X}$
is also found to be positive, despite the fact that no information flows from
$Z$ to $X$, either directly or indirectly.

The reason positive transfer entropy $T_{Z \to X}$ is found in the absence of
influence of $Z$ on $X$ is that $T_{Z \to X}$ is taken to be the difference
between $H(X_{t+1}|X_t)$ and $H(X_{t+1}|X_t, Z_t)$. Here, since $X_{t+1}$ is
dominantly determined by $Y_t$ and only depends weakly on $X_t$, the
conditional entropies satisfy

$$H(X_{t+1}|X_t, Z_t) \approx H(X_{t+1}|Z_t). \tag{37}$$

A closer inspection of the network reveals that, in the strong coupling
regime where the dynamics of an oscillator depends dominantly on its
neighbors' dynamics, the state of $X_{t+1}$ depends mostly on $Y_t$ (and not
$X_t$). Since $Y_t$ depends mostly on $X_{t-1}$ by the very same argument, we
conclude that the mutual information between $X_{t+1}$ and $X_{t-1}$ is high.
Similarly, since $Z_t$ depends mostly on $X_{t-1}$ and $Y_{t-1}$, there is
high mutual information between $Z_t$ and $X_{t-1}$. Based on this analysis,
the mutual information between $X_{t-1}$ and $X_t$ should be low and that
between $Z_t$ and $X_{t+1}$ should be nonnegligible, which is confirmed in
Fig. 5(c-d). Although information in the network flows directly from $X$ to
$Z$, without accounting for the dominant factors that determine the value of
$X_{t+1}$, one would infer a directed link from $Z$ to $X$ based on the
calculation of the transfer entropy $T_{Z \to X}$.

We note that, because of the dominance of $Y$ on $X$ (as opposed to $X$ on
itself), one should instead measure the causation entropies $C_{Y \to X}$,
$C_{X \to X|(Y)}$, and $C_{Z \to X|(X,Y)}$, respectively. Results are shown
in Fig. 5(b). The value $C_{Y \to X} > 0$, as expected. The value
$C_{X \to X|(Y)} \approx 0$, due to the dominant influence of $Y$ (rather than
$X$ itself) on $X$. The value $C_{Z \to X|(X,Y)} \approx 0$ as well,
suggesting the absence of information transfer from $Z$ to $X$, which is
consistent with the structure of the network shown in Fig. 3(d).

4.4. Iterative Evaluation of Causation Entropy in a Network of N Processes

The order in which the causation entropies are evaluated (here $Y, X, Z$) can
in fact be determined a priori, by first choosing the process
$Q_1 \in \{X, Y, Z\}$ that maximizes the causation entropy $C_{Q_1 \to X}$,
and then iteratively selecting $Q_k$ as the process that maximizes
$C_{Q_k \to X|(Q_1, \ldots, Q_{k-1})}$ (see the following paragraph for
details). For the example used in Fig. 5, we found that $Q_1 = Y$ and
$Q_2 = X$. Therefore, in contrast to transfer entropy, causation entropy can
successfully identify the dominance of neighbors and in turn avoid erroneous
inference of couplings due to its effect.


For a network of $N$ coupled stochastic processes
$X^{(1)}, X^{(2)}, \ldots, X^{(N)}$, we propose to identify the set of causal
processes of a given process $i$ by iterative maximization of causation
entropy. Let $n_0 = i$. We first find process $n_1$ that satisfies

$$n_1 = \arg\max_{j} C_{X^{(j)} \to X^{(i)}}. \tag{38}$$

Then we iteratively seek $n_k$ ($k = 2, 3, \ldots$) that satisfies

$$n_k = \arg\max_{j} C_{X^{(j)} \to X^{(i)}|(X^{(n_0)}, X^{(n_1)}, \ldots, X^{(n_{k-1})})}. \tag{39}$$

We stop the search at step $k$ when

$$C_{X^{(n_k)} \to X^{(i)}} < \theta, \tag{40}$$

where $\theta$ is a preselected tolerance value. The processes
$n_1, n_2, \ldots, n_{k-1}$ (in decreasing order of dominance) form the set
of causal processes of $i$.

Note that in theory the value of $C_{X^{(n_k)} \to X^{(i)}}$ will be exactly
zero if the dynamics of node $n_k$ does not causally determine the dynamics of
node $i$. In practice, however, the numerical estimation of
$C_{X^{(n_k)} \to X^{(i)}}$ is based on the estimation of probability
distributions from a finite sample, and will be close to (but not necessarily
equal to) zero for a finite number of data points. A rigorous way of
determining whether the numerically computed causation entropy should be
identified as zero is to perform a hypothesis test. Such a test can be
challenging to carry out in practice, and often one can instead use a shuffle
test to obtain approximate confidence intervals.
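
The iterative search can be sketched as a greedy loop. The version below follows the three-node example of this section: the candidate set includes the target itself, each step conditions on the processes selected so far, and the search stops when the largest conditional causation entropy falls below the tolerance. These choices, the function names, the fixed tolerance (used in place of a hypothesis or shuffle test), and the synthetic chain used for the demonstration are assumptions of this sketch.

```python
import numpy as np

def joint_entropy(columns):
    """Plug-in joint entropy (bits) of a list of symbol sequences; 0 for an empty list."""
    if len(columns) == 0:
        return 0.0
    data = np.column_stack(columns)
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def causation_entropy(q, p, cond):
    """Estimate C_{Q -> P | (S)} for symbol sequences, with S given as a list of sequences."""
    future = p[1:]
    s_now = [c[:-1] for c in cond]
    h_given_s = joint_entropy([future] + s_now) - joint_entropy(s_now)
    h_given_sq = joint_entropy([future] + s_now + [q[:-1]]) - joint_entropy(s_now + [q[:-1]])
    return h_given_s - h_given_sq

def causal_processes(series, i, theta):
    """Greedy selection of the causal processes of column i of `series` (time x processes)."""
    n_proc = series.shape[1]
    selected = []
    while len(selected) < n_proc:
        cond = [series[:, k] for k in selected]
        candidates = [j for j in range(n_proc) if j not in selected]
        scores = [causation_entropy(series[:, j], series[:, i], cond) for j in candidates]
        best = int(np.argmax(scores))
        if scores[best] < theta:   # stopping rule with tolerance theta
            break
        selected.append(candidates[best])
    return selected

rng = np.random.default_rng(0)
z = np.cumsum(rng.random(20000) < 0.1) % 2   # binary state that flips with probability 0.1
y = np.roll(z, 1)                            # y[t+1] = z[t]
x = np.roll(y, 1)                            # x[t+1] = y[t]
series = np.column_stack([x, y, z])          # columns: 0 = X, 1 = Y, 2 = Z

found = causal_processes(series, i=0, theta=0.01)
print(found)
```

In this toy chain only the direct driver of the first column is retained, because once it is in the conditioning set neither the target's own past nor the indirect source reduces the remaining uncertainty.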

4.5. Example: Anticipatory couplings

Our last example is a unidirectionally coupled dynamical system with
anticipatory coupling [13]:

$$\begin{cases} x_{t+1} = f(x_t), \\ y_{t+1} = (1 - \epsilon) f(y_t) + \epsilon\left[(1 - \alpha) f(x_t) + \alpha f^2(x_t)\right], \end{cases} \tag{41}$$

where $f(x) = a x (1 - x)$ with $a = 4$, parameter $\epsilon \in [0, 1]$ is the
coupling strength, and parameter $\alpha \in [0, 1]$ is the strength of
anticipatory coupling. The notation $f^2$ means that the map $f$ is applied
twice.

Here we adopt the concept and notation of transfer entropy to define

$$T_{X_t \to Y_{t+1}} = T_{X \to Y} = H(Y_{t+1}|Y_t) - H(Y_{t+1}|Y_t, X_t), \tag{42}$$

and

$$T_{X_{t+1} \to Y_{t+1}} = H(Y_{t+1}|Y_t) - H(Y_{t+1}|Y_t, X_{t+1}). \tag{43}$$

Figure 6(a) shows that both $T_{X_t \to Y_{t+1}}$ and
$T_{X_{t+1} \to Y_{t+1}}$ are positive, with comparable values. Does this
suggest that both $X_t$ and $X_{t+1}$ independently influence $Y_{t+1}$? The
standard interpretation of transfer entropy would suggest that the answer to
this question is yes.

By use of causation entropy, we find that $Y_{t+1}$ is primarily determined
by $Y_t$. The second dominant influence on $Y_{t+1}$ is $X_{t+1}$, as
confirmed by the values of

$$C_{X_{t+1} \to Y_{t+1}|(Y_t)} = H(Y_{t+1}|Y_t) - H(Y_{t+1}|Y_t, X_{t+1}). \tag{44}$$

It turns out that additional information of $X_t$ (beyond $Y_t$ and
$X_{t+1}$) does not contribute to the reduction of uncertainty of $Y_{t+1}$.
This is validated by the causation entropy

$$C_{X_t \to Y_{t+1}|(Y_t, X_{t+1})} = H(Y_{t+1}|Y_t, X_{t+1}) - H(Y_{t+1}|Y_t, X_{t+1}, X_t), \tag{45}$$

which remains close to zero, as shown in Fig. 6(b).

Therefore, in contrast to transfer entropy analysis, which would suggest
that both $X_t$ and $X_{t+1}$ participate in the determination of $Y_{t+1}$,
causation entropy analysis reveals that information of $X_t$ is in fact
completely redundant in inferring $Y_{t+1}$. Indeed, by expressing $f(x_t)$ as
$x_{t+1}$ in Eq. (41), it appears that the value of $y_{t+1}$ depends solely
on $y_t$ and $x_{t+1}$, and not on $x_t$.
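
The substitution can be written out explicitly. The short derivation below assumes that the second line of Eq. (41) reads $y_{t+1} = (1-\epsilon) f(y_t) + \epsilon[(1-\alpha) f(x_t) + \alpha f^2(x_t)]$ and uses only the identities $x_{t+1} = f(x_t)$ and $f^2(x_t) = f(x_{t+1})$.

```latex
\begin{align*}
y_{t+1} &= (1-\epsilon)\, f(y_t)
           + \epsilon\left[(1-\alpha)\, f(x_t) + \alpha\, f^2(x_t)\right] \\
        &= (1-\epsilon)\, f(y_t)
           + \epsilon\left[(1-\alpha)\, x_{t+1} + \alpha\, f(x_{t+1})\right].
\end{align*}
```

The right-hand side of the last line contains only $y_t$ and $x_{t+1}$, which is consistent with the vanishing causation entropy of Eq. (45).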


5. Information Transfer in Time-Dependent Networks 

The effects of time-dependent structures on network dynamics are often
intriguing and pose considerable challenges for analysis. For example, the
problem of synchronization stability of coupled oscillators in time-dependent
networks has been fully addressed only for a few specific cases
[32, 33, 34, 35, 36]. Here, our focus is to measure information transfer
among oscillators that are coupled through a time-dependent network structure
(that is, a network whose edges change in time). In particular, we generalize
Eq. (20) to allow for time-dependent interactions between oscillators, as

$$x_{t+1}^{(i)} = f[x_t^{(i)}] + \epsilon \sum_{j \neq i} c_{ij}(t)\, g[x_t^{(i)}, x_t^{(j)}], \quad i = 1, 2, \ldots, N. \tag{46}$$

Here all terms in Eq. (46) except for $c_{ij}(t)$ are the same as those in
Eq. (20). The term $c_{ij}(t)$ represents the coupling from $j$ to $i$ at time
$t$ and explicitly accounts for the time-dependent network structure.

We consider time-dependent networks constructed as follows. Start with
a baseline static network whose adjacency matrix is
$C = [c_{ij}]_{N \times N}$. The edges in the network are then allowed to
“blink” according to the following rule, to generate a time-dependent
network: at each time $t$,

$$c_{ij}(t) = \begin{cases} c_{ij}, & \text{with probability } p; \\ 0, & \text{with probability } 1 - p. \end{cases} \tag{47}$$

Therefore, when the blinking probability $p = 1$, the network is the same as
the baseline static network; at the other extreme, when $p = 0$, no edge
exists and the network becomes empty (i.e., each oscillator is isolated and
does not couple to other oscillators). For values of $p$ between 0 and 1, the
network structure changes in time in a stochastic fashion (see Fig. 7 for a
few illustrative examples).

Our interest lies in the information transfer within such time-dependent
networks. Unlike in its static counterpart, the flow of information in a
time-dependent network often cannot be directly obtained from examining
the edges $c_{ij}(t)$, because it is possible for a network to be disconnected
at all times and yet be able to transfer information from one node to another.
Such a scenario has been previously considered in the synchronization of
coupled oscillators in time-dependent networks with edges being switched on
and off [34] and in moving-neighbor networks whose edges are defined by the
local interactions between agents that move in space [33]. In both cases,
even though the original static network is connected, the corresponding
time-dependent network obtained by blinking the edges might not be (see
Fig. 7 for examples).

The connection between these time-dependent networks and the original
static network is that the asymptotic temporal average of each directed edge,
$\langle c_{ij} \rangle$, is proportional to the weight of the same edge in
the static network:

$$\langle c_{ij} \rangle = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} c_{ij}(t) = p\, c_{ij}. \tag{48}$$
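
The blinking rule and its temporal average can be checked with a few lines of code. The sketch below assumes that every edge is kept or dropped independently at each time step; the function name `blinking_network`, the node ordering (X, Y, Z), and the number of time steps are choices made for this example.

```python
import numpy as np

def blinking_network(C, p, T, rng):
    """Return an array of shape (T, N, N): each edge of C is kept with probability p at each time."""
    C = np.asarray(C, dtype=float)
    keep = rng.random((T,) + C.shape) < p   # independent draw per edge and per time step
    return keep * C

# Directed chain Z -> Y -> X with node order (X, Y, Z); C[i, j] is the coupling from j to i.
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

rng = np.random.default_rng(0)
p = 0.5
Ct = blinking_network(C, p=p, T=20000, rng=rng)
average = Ct.mean(axis=0)   # finite-time estimate of the temporal average of each edge
print(np.round(average, 2))
```

For a long enough record the finite-time average of each edge approaches $p\, c_{ij}$, and edges that are absent from the static network stay absent at all times.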

We ran numerical simulations on several time-dependent networks and focused
on the information flow measured both by transfer entropy and by causation
entropy. Figure 8(a) shows that, for the directed linear chain
$Z \to Y \to X$ with fixed coupling strength $\epsilon = 0.4$, when the
blinking probability $p$ increases, the transfer entropy $T_{Z \to X}$ becomes
increasingly nonnegligible, which under the standard interpretation would
indicate direct information transfer from $Z$ to $X$. On the other hand, the
causation entropy $C_{Z \to X|(X,Y)}$ remains essentially zero, suggesting
that the information transferred from $Z$ to $X$ is merely a redundancy of the
information that is transferred from $Z$ to $Y$ and from $Y$ to $X$,
respectively, possibly at different times. Figure 8(b-c) shows a similar
comparison between transfer entropy and causation entropy for measuring the
information flow in time-dependent networks that originate from the networks
shown in Fig. 3(b-c) with the fixed coupling strength $\epsilon = 0.4$.

The possible misinterpretation of transfer entropy becomes more evident
in the dominance of neighbors scenario, where the coupling strength
$\epsilon$ is close to 1. As shown in Fig. 8(d), in this scenario transfer
entropy identifies a strong information transfer from $Z$ to $X$, whereas in
the time-averaged network the coupling runs in exactly the opposite
direction. Causation entropy, on the other hand, successfully identifies the
dominant nodes that influence the dynamics of $X$, namely, its neighbor $Y$
and then $X$ itself.

6. Discussion and Conclusion 

Our main message here is that while being an essential problem in science 
in general, and dynamical systems in particular, the question of what is cause 
and what is influence in complex system analysis is challenging, not due to the 
lack of methodology, but rather due to the lack of clear and comprehensible 
understanding of the applicability of proposed methods, in particular when 
the underlying system involves complex interactions. The popular concept of 
transfer entropy has been used lately to serve as a way of inferring causality, 
without much understanding about its domain of success. 

We here explored information flow measured from the dynamics of small-scale
coupled oscillator networks, attempting to gain insights into the validity
of transfer entropy as well as its limitations. For two coupled oscillators,
transfer entropy is found to successfully detect the directionality of
information flow, even in cases where the couplings are blinking
(time-dependent). However, its validity breaks down in the presence of
indirect couplings, dominance of neighboring dynamics, and anticipatory
couplings, which are common in large-scale complex systems.

To overcome the limitations of transfer entropy, we introduced a new
measure of information flow called causation entropy, which is designed to
allow inference of causation despite the presence of primary and secondary
influences between elements of a larger coupled system. We highlighted the
success of our approach with several examples where specifically the transfer
entropy cannot distinguish between causation and independence but causation
entropy successfully infers the true causal relationships.

Given the recent advancements in estimating transfer entropy in rather
general settings including multivariate time series and infinite time delay,
it is our hope to build on the idea of causation entropy to explore
information flow and coupling inference in larger-scale systems, which are
important for a wide range of applications across scientific fields. One
challenge is that, for large-scale systems, naive binning methods would
require a number of data points that grows exponentially with the number of
variables in order to reliably calculate entropies (including joint entropy,
transfer entropy, and also causation entropy). Nonparametric density
estimation methods previously developed for mutual information [39] are likely
to offer a route towards the reliable estimation of causation entropies in
large-scale dynamical systems.

Figure 1: Venn-like diagrams for information-theoretical measures. (a)
Relations between: entropies $H(X)$ and $H(Y)$, joint entropy $H(X, Y)$,
conditional entropies $H(X|Y)$ and $H(Y|X)$, and mutual information
$I(X; Y)$, of two random variables $X$ and $Y$. (b) Relations between:
transfer entropy $T_{Y \to X}$, entropies of random variables $X_{t+1}$,
$X_t$, and $Y_t$, and their joint and conditional entropies. The transfer
entropy is the difference between the conditional entropies
$H(X_{t+1}|X_t, Y_t)$ and $H(X_{t+1}|X_t)$, which measures the extra
information provided by $Y_t$ (in addition to $X_t$) in the determination of
$X_{t+1}$.

Figure 2: Measuring information flow in two coupled logistic maps. (a)
Dependence of mutual information $I_{X-Y} = I(X; Y)$ and transfer entropies
$T_{X \to Y}$ and $T_{Y \to X}$ on coupling strength $\epsilon$ for two
bidirectionally coupled oscillators. Synchronization occurs when
$\epsilon \in (0.25, 0.75)$, which is the same region where the mutual
information reaches its maximum. Due to the symmetry of coupling,
$T_{X \to Y} = T_{Y \to X}$. (b) Typical time series for two bidirectionally
coupled oscillators, with $\epsilon = 0.1$ (top, unsynchronized trajectories)
and $\epsilon = 0.6$ (bottom, synchronized trajectories). (c-d) Same as (a-b),
but for two oscillators with unidirectional coupling from oscillator $X$ to
oscillator $Y$. In this case, synchronization appears when
$\epsilon \in [0.5, 1)$. For $\epsilon \in (0, 0.5)$,
$T_{X \to Y} > T_{Y \to X}$, indicating that the dominant direction of
information flow between $X$ and $Y$ is from $X$ to $Y$. In all simulations of
the paper, we generate trajectories of length $10^5$ and discard the initial
5% segments for all information measures. The interval $[0, 1]$ is divided
evenly into $2^4$ subintervals for the estimation of discrete probabilities.
In our simulations, we made the choice of $2^4$ based on the balance between
the length of the time series and the number of variables in the joint
distribution: too few subintervals will only reveal limited information about
the true dynamics and, on the other hand, too many of them will lead to
statistical undersampling. Note that this problem of finding an appropriate
number of subintervals for the estimation of entropy is analogous to the
problem of finding an appropriate number of bins to construct a histogram, for
which no “best” solution exists in general.

Figure 3: Small-scale directed binary networks. (a-c) Networks with a directed
linear chain $Z \to Y \to X$, but no direct coupling $Z \to X$. (d) A network
that contains a direct coupling $X \to Z$, but not $Z \to X$.

Figure 4: Causation entropy and transfer entropy for the identification of
indirect coupling. (a-b) Transfer entropies
$\{T_{X \to X}, T_{Y \to X}, T_{Z \to X}\}$ and causation entropies
$\{C_{X \to X}, C_{Y \to X|(X)}, C_{Z \to X|(X,Y)}\}$ for the network shown in
Fig. 3(a) with dynamics (20). Note that the transfer entropy $T_{Z \to X}$ is
positive despite the absence of direct coupling from $Z$ to $X$. On the other
hand, the causation entropy $C_{Z \to X|(X,Y)} \approx 0$, since all the
information that is indirectly transferred from $Z$ to $X$ goes through $Y$.
(c-d) Same as (a-b), for the network in Fig. 3(b). (e-f) Same as (a-b), for
the network in Fig. 3(c).

Figure 5: Causation entropy versus transfer entropy under the dominance of
neighbors. (a) Transfer entropies
$\{T_{X \to X}, T_{Y \to X}, T_{Z \to X}\}$. (b) Causation entropies
$\{C_{X \to X|(Y)}, C_{Y \to X}, C_{Z \to X|(X,Y)}\}$ for the network in
Fig. 3(d) whose dynamics follow Eq. (20). (c) Scatter plot between $X_{t-1}$
and $X_t$ for $\epsilon = 1$. (d) Scatter plot between $Z_{t-1}$ and $X_t$ for
$\epsilon = 1$. In (c) and (d), points are taken from a randomly selected
trajectory segment (of the full trajectory) of length 1000.

Figure 6: Causation entropy versus transfer entropy under anticipatory
coupling. (a) Transfer entropies $T_{X_t \to Y_{t+1}}$ and
$T_{X_{t+1} \to Y_{t+1}}$. (b) Causation entropies
$C_{X_{t+1} \to Y_{t+1}|(Y_t)}$ and $C_{X_t \to Y_{t+1}|(Y_t, X_{t+1})}$. Here
parameter $\epsilon = 0.3$.

Figure 7: Examples of time-dependent networks. First (leftmost) column:
structure of static networks (the same as those in Fig. 3). Second to the last
(rightmost) columns: typical network structures at different times, obtained
from keeping each directed edge of the static network independently with
probability $p = 0.5$ at each time $t$.

Figure 8: Causation entropy versus transfer entropy for time-dependent
networks. (a-b) Transfer entropies
$\{T_{X \to X}, T_{Y \to X}, T_{Z \to X}\}$ and causation entropies
$\{C_{X \to X}, C_{Y \to X|(X)}, C_{Z \to X|(X,Y)}\}$ for the time-dependent
network that originates from the network in Fig. 3(a) via Eq. (47) and is
endowed with dynamics (46), for the fixed coupling strength $\epsilon = 0.4$.
(c-f) Same as (a-b), for the networks in Fig. 3(b) and Fig. 3(c),
respectively, and $\epsilon = 0.4$. (g-h) Same as (a-b), for the network in
Fig. 3(d) and $\epsilon = 0.9$.